Translocating proteins compartment-specifically alter the fate of epithelial-mesenchymal transition in a compartmentalized Boolean network model

Regulation of translocating proteins is crucial in defining cellular behaviour. Epithelial-mesenchymal transition (EMT) is important in cellular processes, such as cancer progression. Several orchestrators of EMT, such as key transcription factors, are known to translocate. We show that translocating proteins become enriched in EMT signalling. To simulate the compartment-specific functions of translocating proteins, we created a compartmentalized Boolean network model. This model successfully reproduced known biological traits of EMT and, as a novel feature, it also captured organelle-specific functions of proteins. Our results predicted that glycogen synthase kinase-3 beta (GSK3B) compartment-specifically alters the fate of EMT; among others, the activation of nuclear GSK3B halts transforming growth factor beta-1 (TGFB)-induced EMT. Moreover, our results recapitulated that the nuclear activation of glioma-associated oncogene transcription factors (GLI) is needed to achieve a complete EMT. Compartmentalized network models will be useful to uncover novel control mechanisms of biological processes. Our algorithmic procedures can be automatically rerun on the https://translocaboole.linkgroup.hu website, which provides a framework for similar future studies.


Supplementary Note 1: Three stable motifs explain the emergence of hybrid attractors
We analysed our compartmentalized Boolean EMT model from the perspective of stable motif succession and stabilization, and of the role of stable motifs in locking the dynamics into different attractors. Stable motifs (SMs) are generalized positive feedback loops, which represent sets of self-sustaining node states that, once activated, remain permanently locked 1. A more generalized form of stable motifs are conditionally stable motifs (CSMs), whose self-sustaining quality is contingent on external node states 2. If the conditions of a CSM are locked in (e.g. by an upstream stable motif), it acts as a stable motif itself, since its conditions are permanently satisfied. Because of this, and for the sake of brevity, we will generally refer to CSMs as stable motifs, only highlighting their conditional character where it is relevant.
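The self-sustaining property described above can be illustrated on a small example. The following is a minimal sketch on a hypothetical four-node network (the nodes A, B, E, F, their rules and the helper `is_self_sustaining` are assumptions made for illustration and are not part of the EMT model). It checks only whether a set of node states sustains itself for every state of the remaining nodes; it is not the algorithm implemented in the StableMotifs package.

```python
from itertools import product

# Hypothetical toy rules, NOT taken from the EMT model.
RULES = {
    "A": lambda s: s["B"],             # A and B activate each other
    "B": lambda s: s["A"],
    "E": lambda s: s["F"] and s["A"],  # the E-F loop needs A to be active
    "F": lambda s: s["E"],
}
NODES = sorted(RULES)


def is_self_sustaining(motif, condition=None):
    """Return True if every node in `motif` keeps its value under its own rule,
    for all states of the nodes fixed neither by `motif` nor by `condition`."""
    fixed = {**(condition or {}), **motif}
    free = [n for n in NODES if n not in fixed]
    for values in product([0, 1], repeat=len(free)):
        state = {**fixed, **dict(zip(free, values))}
        for node, value in motif.items():
            if int(bool(RULES[node](state))) != value:
                return False
    return True


unconditional = is_self_sustaining({"A": 1, "B": 1})
loop_alone = is_self_sustaining({"E": 1, "F": 1})
loop_given_a = is_self_sustaining({"E": 1, "F": 1}, condition={"A": 1})
```

In this toy network the node states A=1, B=1 sustain themselves unconditionally, whereas E=1, F=1 do so only under the condition A=1. Once A=1 is locked in by the upstream motif, the conditional loop behaves like an ordinary stable motif.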
Stable motifs and their successive stabilization determine the different irreversible pathways that will end up in attractors. Moreover, as we will see, stable motifs are key in understanding the emergence of the different attractors, including the hybrid states.
Supplementary Table 2  all epithelial (blue) motifs and they all inexorably lead to the mesenchymal attractor. Most important, however, in understanding the emergence of the hybrid attractors are the stable motifs highlighted in green (2, 4, 13). These stable motifs are consistent with all the mesenchymal motifs (orange) but are inconsistent with only some of the epithelial motifs (blue). This means that they can stabilize simultaneously with the consistent epithelial motifs and create attractors that are neither fully epithelial nor fully mesenchymal. For example, following the stabilization from the top, from SM 5 to SM 14, the system has a choice: it can next stabilize SM 2 (Bcatenin_memb=0, Ecadherin=0) or SM 12 (Bcatenin_memb=1, Ecadherin=1). Neither SM 2 nor SM 12 is inconsistent with the upstream motifs (SM 14 and SM 5), but they are inconsistent with each other, thus they generate (at least) two different attractors. The possible combinations of the hybrid motifs with the epithelial motifs allow for the creation of the five hybrid attractors, besides the E and M states. This insight is crucial for future validation of the model. For instance, if any of the hybrid attractors are found to be biologically implausible, the primary target of our inquiry into removing that attractor would be re-evaluating the rules making up the hybrid motifs. An example of this procedure is already implicitly implemented in this model, as compared to the 19-node version of the EMT model from Steinway et al. 3. In that model SNAI2=1 formed an additional hybrid motif, leading to multiple extra hybrid attractors. Due to the compartmentalization implemented in our compartmentalized EMT model, neither the nuclear nor the cytoplasmic version of SNAI2 forms a hybrid motif, thus making the predictions of the current model more parsimonious.
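The consistency argument used for SM 2 and SM 12 can be written down directly. The sketch below is a minimal illustration in which the node states of SM 2 and SM 12 are taken from the text, while `UPSTREAM` is a hypothetical placeholder standing in for the already stabilized motifs SM 5 and SM 14 (whose node states are listed in Supplementary Table 2 and are not reproduced here). The helper names `consistent` and `next_choices` are also assumptions.

```python
def consistent(m1, m2):
    """Two motifs are consistent if they agree on every node they share."""
    return all(m2[node] == value for node, value in m1.items() if node in m2)


def next_choices(stabilized, candidates):
    """Names of candidate motifs that are consistent with every stabilized motif."""
    return sorted(
        name
        for name, motif in candidates.items()
        if all(consistent(motif, s) for s in stabilized)
    )


# Node states quoted in the text.
SM2 = {"Bcatenin_memb": 0, "Ecadherin": 0}
SM12 = {"Bcatenin_memb": 1, "Ecadherin": 1}
# Hypothetical placeholder for the upstream motifs (SM 5 and SM 14).
UPSTREAM = {"upstream_node": 1}

# Both motifs are available after the upstream motifs have stabilized ...
choices = next_choices([UPSTREAM], {"SM 2": SM2, "SM 12": SM12})
# ... but stabilizing SM 2 excludes SM 12, so the paths diverge.
after_sm2 = next_choices([UPSTREAM, SM2], {"SM 12": SM12})
```

Each such point, where two mutually inconsistent motifs are both still available, is a bifurcation of the stabilization path and yields at least two distinct attractors.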
In the case when hybrid attractors are biologically plausible but represent unhealthy states, understanding the possible bifurcations of the system from the healthy path to an unhealthy one (see the example above) can be critical in designing successful interventions. In Supplementary Figure 2, all the stabilization paths that lead to the epithelial attractor are highlighted in blue. Every other path leads to either a hybrid or the mesenchymal attractor.
The stable motif analysis was conducted with the methodology described by Rozum et al. 4 with the software package available at: https://github.com/jcrozum/StableMotifs

Supplementary Note 2: Stability of attractors
The results of the attractor stability analysis (summarized in Supplementary Table 3) show that the mesenchymal attractor is by far the most dominant attractor in both models (over 94% probability).
In our compartmentalized EMT model, the mesenchymal state is marginally less stable and the probability of the epithelial attractor is fourfold higher. Nonetheless, the contrast between the two extreme states is still 4 orders of magnitude.
This contrast in stability is further emphasized by the fact that the (condition-free) stable motifs of the epithelial attractor, 5 and 6, are much more complex, and thus more difficult to stabilize as compared to the stable motifs of the mesenchymal attractor (0,1,2 and 4). Due to this, non-locked states are much more likely to be locked into the mesenchymal attractor by noise or targeted interventions.
We confirmed this conclusion using stable motif control theory 4,5 , which finds sets of interventions which can drive the system from any state into a desired attractor. The control sets of the mesenchymal attractor consist of single node interventions while the control sets of the epithelial attractor consist of sets of at least five simultaneous interventions. We note here that the minimum five simultaneous interventions in the case of our compartmentalized model is an improvement compared to the control sets of minimum 7 interventions in the Steinway et al. 3 model.
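Attractor probabilities of this kind can be estimated by sampling. The following is a minimal sketch, assuming random initial states and asynchronous updating, on a hypothetical two-node network (the rules and all function names are assumptions and are unrelated to the EMT model); it is meant to illustrate the idea of an attractor probability, not to reproduce the procedure behind Supplementary Table 3.

```python
import random
from collections import Counter

# Hypothetical toy rules, NOT taken from the EMT model:
# either node switches on as soon as any of the two is on.
RULES = {
    "A": lambda s: s["A"] or s["B"],
    "B": lambda s: s["A"] or s["B"],
}
NODES = sorted(RULES)


def is_fixed_point(state):
    """A state is a fixed point if no node would change under its rule."""
    return all(int(bool(RULES[n](state))) == state[n] for n in NODES)


def run_to_fixed_point(state, rng, max_steps=1000):
    """Update one randomly chosen node at a time until a fixed point is reached."""
    state = dict(state)
    for _ in range(max_steps):
        if is_fixed_point(state):
            return tuple(state[n] for n in NODES)
        node = rng.choice(NODES)
        state[node] = int(bool(RULES[node](state)))
    return None  # no fixed point reached within max_steps


def estimate_probabilities(n_samples=2000, seed=0):
    """Fraction of random initial states that end up in each fixed point."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        init = {n: rng.randint(0, 1) for n in NODES}
        counts[run_to_fixed_point(init, rng)] += 1
    return {attractor: c / n_samples for attractor, c in counts.items()}


probabilities = estimate_probabilities()
```

In this toy network three of the four initial states end in the (1, 1) fixed point, and switching on a single node is enough to reach it from anywhere, while reaching (0, 0) requires both nodes to be off. This mirrors, in miniature, the asymmetry between single-node control sets and control sets that need several simultaneous interventions.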

Supplementary Note 3: Effects of the involvement of the LIF/KLF4 pathway on our compartmentalized model
To test the robustness of our results, we re-examined them using a similar modelling approach published by Font-Clos et al. 6. In their work, Font-Clos et al. 6 enriched the original EMT model of Steinway et al. 3 with the LIF/KLF4 pathway. They added 4 nodes and 9 edges to the system and removed 1 edge from the network. In our work we used the reduced network of Steinway et al. 3 with only 19 nodes, thus we needed to simplify the LIF/KLF4 pathway similarly. This resulted in the addition of the KLF4 node (which is inhibited by ZEB2 and activates E-cadherin) to this reduced network.
Rerunning our simulations after the addition of the KLF4 node as discussed above showed that the compartmentalized functions of GSK3B and GLI are still present, and that there is no notable difference in the control sets or in the attractor landscape (including the probabilities of the different attractor states). This provides important additional evidence for the robustness of our method. Notably, the knock-in of the GSK3B_nuc node resulted in a partial MET, as the E-cadherin node became activated (even during TGFBR activation). This indicates that the inclusion of the LIF/KLF4 pathway is important in larger-scale studies and could play a role in balancing the epithelial and mesenchymal attractors.

Supplementary Note 4: Constraints of the Gene Ontology-based functional characterization of our compartmentalized model
The results of the Gene Ontology (GO)-based functional analysis (Supplementary Figure 6) show that during EMT the "epithelial to mesenchymal transition" process and "cell motility" are activated, as expected. These results are in concert with known biological observations and are important for the functional characterization of our model.
On the other hand, this analysis could not recapitulate the fact that cell-cell and cell-matrix adhesion weaken during EMT. There was a marked decrease in the middle of the transition, but almost immediately these functions were activated again. These contradictory results stem from the fact that our network is compartmentalized but the GO database is not. The GO database annotates proteins with biological terms, which neglects the fact that proteins often function compartment-specifically. In our model this results in a false readout: for example, the activation of Ecadherin_CTF still contributes to the "epithelial cell-cell adhesion" activity, which is invalid from a biological perspective. Thus, to appropriately characterize the functional changes of a compartmentalized model, future studies must use a compartmentalized functional readout, which is not available yet.
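The origin of this false readout can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the scoring rule (fraction of active annotated nodes), the annotation tables and the function name `term_activity` are hypothetical, and the compartment-aware table stands for a resource that does not exist yet. Only the node names Ecadherin and Ecadherin_CTF and the GO term come from the model.

```python
TERM = "epithelial cell-cell adhesion"

# Protein-level annotation: both nodes inherit the terms of their parent protein.
NODE_TO_PROTEIN = {"Ecadherin": "Ecadherin", "Ecadherin_CTF": "Ecadherin"}
PROTEIN_TERMS = {"Ecadherin": {TERM}}

# Hypothetical compartment-aware annotation: the CTF node does not carry the term.
NODE_TERMS = {"Ecadherin": {TERM}, "Ecadherin_CTF": set()}


def term_activity(state, term, terms_of_node):
    """Assumed score: fraction of nodes annotated with `term` that are active."""
    annotated = [n for n in state if term in terms_of_node(n)]
    if not annotated:
        return 0.0
    return sum(state[n] for n in annotated) / len(annotated)


# Illustrative state: membrane E-cadherin inactive, Ecadherin_CTF active.
state = {"Ecadherin": 0, "Ecadherin_CTF": 1}

protein_level = term_activity(
    state, TERM, lambda n: PROTEIN_TERMS.get(NODE_TO_PROTEIN[n], set())
)
compartment_level = term_activity(state, TERM, lambda n: NODE_TERMS[n])
```

With protein-level annotation the active Ecadherin_CTF node keeps the adhesion term partly active although membrane E-cadherin is off, whereas a compartment-aware annotation would report no adhesion activity for the same state.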

Supplementary Note 5: Correlation of node compartmentalization with protein mass and function
In our compartmentalized EMT model, nodes that represent translocating proteins are divided according to subcellular locations (see Fig. 2 of the main text). An important difference between our model and the original EMT model is that in our case the nodes that represent one protein do not always mutually inhibit each other. As a result, one protein may be simultaneously active in different organelles. One could argue that this breaks the law of mass conservation, as simultaneous activation in more than one organelle assumes increased protein mass. In Boolean models the base hypothesis is that there is a threshold above which there is an observable function, but this threshold may differ between organelles. One underlying reason is that the functionality of a protein is not always proportional to its concentration. If a protein translocates to an organelle that contains high-affinity interaction partners of that protein, then a relatively small amount of translocated protein can be functional, and the original function in the other organelle may be preserved. Also, the volume of organelles varies greatly: the cytoplasm is relatively large, while the nucleus or mitochondria are notably smaller, not to mention biomolecular condensates 7,8. A flow of a small amount of protein from a large organelle to a smaller one is enough to reach locally high concentration levels. Moreover, in biomolecular systems the protein mass is dynamically regulated: proteins may be degraded or synthesized, all of which also alters protein mass 8. Thus, the simultaneously active Boolean state of a protein in more than one organelle does not by any means imply increased overall protein mass.
The right attractor from the middle two is more mesenchymal with no E-cadherin expression, but the simulations only resulted in this attractor in 12% of the cases.
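The volume argument of Supplementary Note 5 can be made concrete with a short calculation. The sketch below uses purely hypothetical numbers (molecule counts, relative volumes and activity thresholds in arbitrary units are all assumptions chosen for illustration) to show that a protein can be above threshold in two compartments without any increase in total protein mass.

```python
# All numbers are hypothetical and in arbitrary units.
CYTO_VOLUME = 10.0   # assumed: cytoplasm ten times larger than the nucleus
NUC_VOLUME = 1.0
THRESHOLD = {"cytoplasm": 40.0, "nucleus": 40.0}  # assumed activity thresholds

TOTAL_MOLECULES = 1000
TRANSLOCATED = 50    # 5% of the pool moves to the nucleus


def concentration(molecules, volume):
    """Concentration as number of molecules per unit volume."""
    return molecules / volume


def boolean_states(n_cyto, n_nuc):
    """Boolean activity per compartment: 1 if the local concentration reaches the threshold."""
    return {
        "cytoplasm": int(concentration(n_cyto, CYTO_VOLUME) >= THRESHOLD["cytoplasm"]),
        "nucleus": int(concentration(n_nuc, NUC_VOLUME) >= THRESHOLD["nucleus"]),
    }


before = boolean_states(TOTAL_MOLECULES, 0)
after = boolean_states(TOTAL_MOLECULES - TRANSLOCATED, TRANSLOCATED)
```

With these assumed values the cytoplasmic concentration drops only from 100 to 95, while the nuclear concentration rises from 0 to 50, so both compartment-specific nodes are active after translocation although the total number of molecules is unchanged.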

Supplementary Figure 6: Activity of 5 GO biological processes during EMT
Here we show the activity of 5 GO biological processes during EMT. There are 11 single-node perturbations that led to EMT; the average activation of these GO terms over these 11 perturbations is shown. The figure shows the activation of the "epithelial to mesenchymal transition" and "cell motility" processes during EMT. In the case of the other processes the results are contradictory, as the marked decrease in activity is followed by an instant activation.

Supplementary Figure 7: All possible node compartmentalization scenarios
A compartmentalized node (separated into two different nodes) could be represented in several ways, but there are 3 main underlying principles. In the first case there is one-way communication between the compartmentalized nodes, either inhibitory or activatory; this is reflected in scenarios a, b, c and d. In these cases there is no direct feedback between the compartmentalized nodes.
Scenario e happens when the different pools of a protein are regulated separately and there is no direct interaction between those protein pools. In scenarios f, g, h and i there is direct feedback between the compartmentalized nodes, which could be inhibitory, activatory or a combination of both. As highlighted with blue rectangles, nodes NOTCH, SMAD, SNAI1, TGFBR and ZEB1 were compartmentalized according to a, Ecadherin according to c, GSK3B according to e, AXIN2 and Bcatenin according to f, and lastly GLI and SNAI2 according to g.
We have annotated each node of the compartmentalized network to a signalling pathway in order to observe signalling activities during EMT. These signalling pathways are inactive at the beginning of EMT and they get activated during the transition. Their inactive state is marked in the last column.